Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
Authors
Abstract
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V − 1), at least in some particular cases, suggesting that performance improves greatly from V = 2 to V = 5 or 10 and is then almost constant. Overall, this can explain the common advice to take V = 5, at least in our setting and when computational power is limited, as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
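The V-fold criterion discussed in the abstract can be illustrated concretely. The sketch below (illustrative only; function names, the histogram model collection, and the simulated data are our own assumptions, not the paper's code) estimates the least-squares risk of a histogram density estimator, int f_hat^2 − 2 E[f_hat(X)], by averaging the held-out criterion over V folds, and uses it to select a number of bins:

```python
import numpy as np

def ls_cv_score(data, edges, V=5, seed=None):
    """V-fold cross-validation estimate of the least-squares risk of a
    histogram density estimator with the given bin edges. The risk of
    f_hat is int f_hat^2 - 2 E[f_hat(X)]; the expectation is estimated
    on each held-out fold."""
    rng = np.random.default_rng(seed)
    n = len(data)
    folds = np.array_split(rng.permutation(n), V)
    widths = np.diff(edges)
    scores = []
    for k in range(V):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(V) if j != k])
        # Histogram density fitted on the V-1 training folds
        counts, _ = np.histogram(data[train_idx], bins=edges)
        f_hat = counts / (len(train_idx) * widths)
        # Quadratic term: int f_hat^2
        sq = np.sum(f_hat**2 * widths)
        # Linear term: 2 * mean of f_hat over the held-out points
        bins = np.clip(
            np.searchsorted(edges, data[test_idx], side="right") - 1,
            0, len(widths) - 1)
        lin = 2.0 * np.mean(f_hat[bins])
        scores.append(sq - lin)
    return np.mean(scores)

# Select the number of bins minimizing the V-fold criterion (V = 5)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
candidates = [4, 8, 16, 32, 64]
crit = {m: ls_cv_score(x, np.linspace(x.min(), x.max(), m + 1), V=5, seed=1)
        for m in candidates}
best = min(crit, key=crit.get)
```

Per the variance factor 1 + 4/(V − 1) computed in the paper, raising V from 2 to 5 already removes most of the extra variance, which is why larger V brings little further benefit in this sketch's setting.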
Similar resources
V-fold Cross-validation and V-fold Penalization in Least-squares Density Estimation
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares risk of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization), with an upper bound decreasing as...
Appendix to the Article "Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation"
This appendix is organized as follows. The first section (called Section B, for consistency with the numbering of the article) gives complementary computations of variances. Then, results concerning hold-out penalization are detailed in Section D, with the proof of the oracle inequality stated in Section 8.2 (Theorem 12) and an exact computation of the variance. Section E provides complements o...
Robust Cross-Validation Score Functions with Application to Weighted Least Squares Support Vector Machine Function Estimation
In this paper, new robust methods for tuning regularization parameters or other tuning parameters of a learning process for non-linear function estimation are proposed: repeated robust cross-validation score functions (repeated robust V-fold CV) and a robust generalized cross-validation score function (GCVRobust). Both methods are effective for dealing with outliers and non-Gaussian noise distr...
Semiparametric multivariate density estimation for positive data using copulas
In this paper we estimate density functions for positive multivariate data. We propose a semiparametric approach. The estimator combines gamma kernels or local linear kernels, also called boundary kernels, for the estimation of the marginal densities with parametric copulas to model the dependence. This semiparametric approach is robust both to the well-known boundary bias problem and the curse...
A Comparison of Cross-Validation Techniques in Density Estimation
In the setting of nonparametric multivariate density estimation, theorems are established which allow a comparison of the Kullback-Leibler and the Least Squares cross-validation methods of smoothing parameter selection. The family of delta sequence estimators (including kernel, orthogonal series, histogram and histospline estimators) is considered. These theorems also show that eithe...
Journal:
Journal of Machine Learning Research
Volume 17, Issue
Pages: -
Publication date: 2016